Similar efficacy of ibrutinib arms across ALPINE and ELEVATE-RR trials in relapsed/refractory chronic lymphocytic leukemia: a matching-adjusted indirect comparison


 Background: Bruton tyrosine kinase inhibitors (BTKi) are currently widely used for the treatment of patients with chronic lymphocytic leukemia (CLL). Ibrutinib, the first BTKi approved for the treatment of CLL, was followed by the second-generation BTKi, acalabrutinib, and recently the next-generation BTKi, zanubrutinib. Both zanubrutinib and acalabrutinib were compared to ibrutinib in phase 3 randomized controlled trials in relapsed/refractory (R/R) CLL. In the ALPINE trial (NCT03734016), zanubrutinib demonstrated a superior progression-free survival (PFS) when compared with ibrutinib in the all-comer R/R CLL population with hazard ratio (HR)=0.65, whereas the ELEVATE-RR trial (NCT02477696) showed noninferior PFS of acalabrutinib vs ibrutinib in R/R CLL patients with the presence of del(17p) or del(11q) with HR=1. Recent attempts to compare the efficacy results of the ibrutinib arm across trials omitted some patient characteristics that are critical for appropriate cross-trial comparisons. This study aimed to compare the efficacy of the ibrutinib control arm across ALPINE and ELEVATE-RR trials using a comprehensive matching-adjusted indirect comparison (MAIC).
 Methods: Individual patient data from the ibrutinib arm of ALPINE were adjusted to match the published population-level profile of the ibrutinib arm of ELEVATE-RR. To obtain comparable populations for the MAIC, a subgroup of patients from ALPINE was included in the analysis. An unanchored MAIC was conducted to adjust for all relevant treatment effect modifiers (EM). The following were considered for population adjustment: IGHV status, del(17p), del(11q), TP53 status, serum β2-microglobulin, number of prior therapies, and Binet stage. Additional prognostic factors (PF) were also adjusted for in sensitivity analyses. The ALPINE data cutoff of August 2022 (median follow-up: 29.6 months) was used because both independent review committee (IRC) and investigator (INV) assessed data were available and because it allowed comparison with other recently published MAICs. Efficacy of ibrutinib in ALPINE was compared with efficacy of ibrutinib in ELEVATE-RR (median follow-up: 40.9 months). After population adjustment, HRs from a weighted Cox proportional hazards model were used to assess PFS and overall survival (OS) outcomes. PFS was analyzed as per IRC and INV. As the ALPINE trial was conducted during the COVID-19 pandemic and ELEVATE-RR was not, a sensitivity analysis adjusted the ALPINE PFS and OS for COVID-19 impact by censoring patients who died due to COVID-19 at the most recent disease assessment prior to death or at the time of death due to COVID-19.
 Results: The high-risk population in ALPINE included 123 patients in the ibrutinib arm, who were matched against 265 patients in the ibrutinib arm of the ELEVATE-RR trial. After population adjustment, no statistically significant differences were observed in ALPINE-ibrutinib vs ELEVATE-ibrutinib with regard to PFS-IRC (HR=0.80 [0.49-1.28], P=0.3485) (Figure 1), PFS-INV (HR=1.18 [0.75-1.86], P=0.4827) (Figure 2), and OS (HR=0.91 [0.50-1.65], P=0.7539). The sensitivity analysis with COVID-19 adjustment yielded results similar to those of the main analysis. Scenarios matching for both EM and PF also generated results consistent with the main analysis.
 Conclusion: Using a comprehensive list of matching variables, this MAIC compares the performance of ibrutinib across the ALPINE and ELEVATE-RR trials and demonstrates no evidence of a difference. Comparing the common comparator arms of 2 trials (ibrutinib vs ibrutinib) instead of the different investigational arms (zanubrutinib vs acalabrutinib) eliminates some of the residual confounding that is inherent in MAICs. Despite the decreased effective sample size that results from adjusting for a comprehensive list of variables, results were consistent across the multiple scenarios tested. While MAIC provides a basis for testing hypotheses with regard to treatment efficacy across trials, the ultimate evidence of relative efficacy must be sought within randomized controlled trials.


Dear Editor,
Two different randomized controlled trials (RCTs) have compared, head-to-head, the efficacy and safety of Bruton tyrosine kinase inhibitors (BTKis) in chronic lymphocytic leukemia (CLL); in both studies, the first-generation BTKi ibrutinib was used as the comparator arm. ELEVATE-RR (NCT02477696), a multicenter, randomized, open-label, noninferiority phase 3 trial, compared acalabrutinib vs. ibrutinib in patients with previously treated, high-risk [presence of del(17p) and/or del(11q)] CLL [1]. In this study, acalabrutinib met its primary endpoint of progression-free survival (PFS) noninferiority (hazard ratio [HR]: 1.0; 95% confidence interval [CI], 0.79-1.27) with a median PFS of 38.4 months in both arms. Acalabrutinib demonstrated improved tolerability with fewer cardiovascular adverse events (AEs) vs. ibrutinib. ALPINE (NCT03734016) was a global, randomized, open-label phase 3 trial designed to assess the superiority of zanubrutinib over ibrutinib in patients with relapsed/refractory (R/R) CLL or small lymphocytic lymphoma [2][3][4]. In the ALPINE intent-to-treat population, zanubrutinib demonstrated superior PFS compared with ibrutinib when assessed by either an independent review committee (IRC) or by the investigator (INV) [2]. In high-risk patients with del(17p)/TP53 mutation, as well as across other major subgroups, PFS favored zanubrutinib. Furthermore, zanubrutinib had an improved safety profile compared with ibrutinib, with a lower rate of treatment discontinuation and fewer cardiac disorder events, including fewer deaths.
Comparison of ibrutinib arms across separate trials can be made using matching-adjusted indirect comparison (MAIC) methodology, where individual patient-level data (IPD) from one trial are combined with published aggregate data from another trial, followed by propensity score weighting. Baseline characteristics of patients with IPD are weighted, and IPD are reanalyzed to match outcome definitions in the aggregate data [5]. A recent indirect comparison of the ibrutinib arms across the ALPINE, ELEVATE-RR, and RESONATE (ibrutinib vs. ofatumumab) trials using MAIC methodology implied that ibrutinib underperformed in ALPINE [6]. The analysis matched key patient baseline characteristics including age ≥75 years, bulky disease, prior treatments, β2-microglobulin, and del(11q) or del(17p) status but omitted other characteristics critical for appropriate cross-trial comparisons, such as sex, TP53 and immunoglobulin heavy chain variable (IGHV) mutation status, complex karyotype, and Binet stage.
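The weighting step of a MAIC can be illustrated with a minimal Python sketch. It assumes the method-of-moments formulation commonly used for MAIC, in which each patient with IPD receives a weight of the form exp(alpha · (x_i − target)) and alpha is chosen so that the weighted IPD means equal the published aggregate means. The function name `maic_weights`, the plain gradient-descent solver, and the toy covariates and target proportions are illustrative assumptions; they are not the analysis code or data of any trial discussed here.

```python
import math

def maic_weights(ipd_rows, target_means, step=1.0, iters=5000):
    """Return one weight per IPD row so that weighted covariate means match target_means.

    Weights have the form w_i = exp(alpha . (x_i - target)). alpha is found by
    gradient descent on log(sum_i w_i), which is convex; its gradient is the
    weighted mean of the centred covariates, so it is zero exactly when the
    weighted IPD means equal the targets.
    """
    centred = [[x - m for x, m in zip(row, target_means)] for row in ipd_rows]
    alpha = [0.0] * len(target_means)

    def weights_for(a):
        return [math.exp(sum(aj * cj for aj, cj in zip(a, row))) for row in centred]

    for _ in range(iters):
        w = weights_for(alpha)
        total = sum(w)
        grad = [sum(wi * row[j] for wi, row in zip(w, centred)) / total
                for j in range(len(alpha))]
        alpha = [aj - step * gj for aj, gj in zip(alpha, grad)]
    return weights_for(alpha)

# Toy IPD (hypothetical): columns are del(17p) yes/no and unmutated IGHV yes/no.
toy_ipd = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
toy_target = (0.45, 0.80)  # hypothetical aggregate proportions, not trial values

w = maic_weights(toy_ipd, toy_target)
weighted_means = [sum(wi * row[j] for wi, row in zip(w, toy_ipd)) / sum(w)
                  for j in range(2)]
print([round(m, 4) for m in weighted_means])
```

The fixed step size is adequate for a handful of binary covariates; a real analysis would normally rely on a general-purpose optimizer and would check that the targets lie within the range of the IPD.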
The present study compared the efficacy of the ibrutinib arms across the ALPINE and ELEVATE-RR trials using MAIC methodology and a more comprehensive list of matching variables to address the underperformance of ibrutinib within ALPINE reported by Ghia et al. [6]. As there was no common comparator between ALPINE and ELEVATE-RR when comparing the efficacy of the ibrutinib arms, this study used an unanchored MAIC, which was conducted in line with published recommendations [5]. The ALPINE ibrutinib arm IPD (N = 325) were filtered to include patients who met the inclusion criteria of ELEVATE-RR (i.e., R/R CLL with del(17p) or del(11q)). The resulting sample (N = 123) was reweighted to align the distribution of relevant effect modifiers (EMs) and prognostic factors (PFs) with published aggregate data for the ibrutinib arm of ELEVATE-RR (N = 265) [1,2]. Weights were determined using propensity scores. The MAIC was designed to adjust for all relevant EMs and PFs, which were identified based on a review of the impact of different subgroups analyzed in previous CLL trials and confirmed with clinical experts. The selected parameters for propensity score weighting in the base case were del(17p), del(11q), TP53 mutation status, IGHV mutation status, serum β2-microglobulin, number of prior therapies, and Binet stage. Reweighted IPD were used to calculate adjusted efficacy outcomes in ALPINE. Weighted HRs were estimated to compare PFS-IRC, PFS-INV, and overall survival (OS) between the ibrutinib arms in ALPINE and ELEVATE-RR. Pseudo-IPD for time-to-event outcomes in the ibrutinib arm of ELEVATE-RR were reconstructed from the Kaplan-Meier curves reported in the ELEVATE-RR publication using the algorithm by Guyot et al. [7]. HRs for time-to-event outcomes were estimated from a weighted Cox model (i.e., comparing weighted ibrutinib ALPINE data against the pseudo-IPD of ibrutinib in ELEVATE-RR). Nominal p values were reported for descriptive purposes.
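The weighted comparison described above can be sketched as follows. This is a minimal, hedged Python example of a weighted Cox model with a single binary covariate (1 for the reweighted IPD arm, 0 for the pseudo-IPD arm), fitted by Newton's method on the weighted partial likelihood with Breslow handling of ties. The function name `weighted_cox_hr` and the toy data are assumptions for illustration only. The sketch returns a point estimate; confidence intervals in a MAIC would typically need a robust (sandwich) or bootstrap variance, which is not shown.

```python
import math

def weighted_cox_hr(times, events, group, weights, iters=25):
    """Hazard ratio (group 1 vs group 0) from a weighted Cox partial likelihood.

    times   : follow-up time for each patient
    events  : True if the event was observed, False if censored
    group   : 1 for the reweighted IPD arm, 0 for the comparison arm
    weights : MAIC weight for each patient (use 1.0 for pseudo-IPD patients)
    """
    beta = 0.0
    for _ in range(iters):
        score = 0.0
        info = 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue
            s0 = 0.0  # weighted risk-set total
            s1 = 0.0  # weighted risk-set total restricted to group 1
            for t_j, z_j, w_j in zip(times, group, weights):
                if t_j >= t_i:
                    r = w_j * math.exp(beta * z_j)
                    s0 += r
                    s1 += r * z_j
            p = s1 / s0  # weighted share of the risk set that is in group 1
            score += weights[i] * (group[i] - p)
            info += weights[i] * p * (1.0 - p)
        beta += score / info  # Newton update; assumes info > 0 (no complete separation)
    return math.exp(beta)

# Toy data (hypothetical): group 1 events occur slightly earlier than group 0 events.
toy_times = [1, 3, 5, 2, 4, 6]
toy_events = [True] * 6
toy_group = [1, 1, 1, 0, 0, 0]
toy_weights = [1.0] * 6
print(round(weighted_cox_hr(toy_times, toy_events, toy_group, toy_weights), 2))
```

An HR below 1 from such a model would indicate a lower hazard in group 1, and an HR above 1 a higher hazard.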
Sensitivity analyses were performed to assess the robustness of the base case results. In the first sensitivity analysis, additional EMs and PFs, including age, sex, complex karyotype, bulky disease, and Eastern Cooperative Oncology Group Performance Status, were adjusted for. In a second sensitivity analysis, ALPINE PFS and OS were adjusted for COVID-19 impact, as ALPINE was conducted during the COVID-19 period and ELEVATE-RR follow-up data (included in this analysis) were mostly collected before the COVID-19 pandemic. This was achieved by censoring patients who died due to COVID-19 at the most recent disease assessment prior to death or at the time of death due to COVID-19.
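The censoring rule used in the COVID-19 sensitivity analysis can be sketched in Python as follows. The `Record` fields, the function name `censor_covid_deaths`, and the toy data are hypothetical. Which censoring time applies to which endpoint is left as a parameter here, because the text above names both options without assigning them.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    time: float             # months from randomization to event or censoring
    event: bool             # True if the endpoint event was observed
    covid_death: bool       # True if that event was a death attributed to COVID-19
    last_assessment: float  # months to the most recent disease assessment

def censor_covid_deaths(records, censor_at="last_assessment"):
    """Recode COVID-19 deaths as censored observations.

    censor_at="last_assessment" censors at the most recent disease assessment
    before death; censor_at="death" censors at the time of death itself.
    """
    if censor_at not in ("last_assessment", "death"):
        raise ValueError("censor_at must be 'last_assessment' or 'death'")
    adjusted = []
    for r in records:
        if r.event and r.covid_death:
            new_time = r.last_assessment if censor_at == "last_assessment" else r.time
            adjusted.append(replace(r, time=new_time, event=False))
        else:
            adjusted.append(r)
    return adjusted

# Toy records (hypothetical): one COVID-19 death, one other event, one censored patient.
toy = [
    Record(time=14.0, event=True, covid_death=True, last_assessment=12.0),
    Record(time=20.0, event=True, covid_death=False, last_assessment=18.0),
    Record(time=30.0, event=False, covid_death=False, last_assessment=30.0),
]
for r in censor_covid_deaths(toy):
    print(r.time, r.event)
```

Only records whose event is a COVID-19 death are changed; all other events and existing censoring times pass through unchanged.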
Baseline characteristics of the populations before matching and a comprehensive summary of EMs and PFs adjusted in the base case and the sensitivity analyses are summarized in Table 1A. Matching the two populations reduced the effective sample size (ESS) from 123 to 63 in the base case analysis.
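For orientation, the ESS of a weighted sample is usually approximated as the squared sum of the weights divided by the sum of the squared weights. The Python sketch below assumes that approximation; the text above does not state the formula used, and the function name `effective_sample_size` and the toy weights are illustrative.

```python
def effective_sample_size(weights):
    """Approximate ESS as (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Equal weights leave the sample size unchanged; uneven weights reduce it.
print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
print(effective_sample_size([3.0, 1.0, 0.5, 0.5]))
```

Under this definition, the reduction from 123 to 63 reported above means the weighted sample carries roughly half the information of the unweighted high-risk subgroup (63/123 ≈ 0.51).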
The base case PFS-IRC, PFS-INV, and OS for the ibrutinib arms of ALPINE and ELEVATE-RR are shown in Fig. 1 and Table 1B. After matching (median follow-up, 28.4 months), no statistically significant differences were observed in PFS-IRC (HR [95% … 1B). While no significant differences were observed between the efficacy outcomes in the ibrutinib arms of ALPINE and ELEVATE-RR, ibrutinib in ALPINE showed numerical "overperformance" compared to ELEVATE-RR with regard to PFS-IRC and OS. This trend could be observed for the base case and sensitivity analyses, where the HRs of PFS-IRC and OS for the ibrutinib arms of ALPINE vs. ELEVATE-RR were always below 1. These observations highlight the importance of considering both PFS-IRC and PFS-INV for unanchored MAICs, where available, as the conclusions may change when using different PFS measurements. However, given that both ALPINE and ELEVATE-RR were open-label trials, PFS-IRC is the preferred endpoint [8,9].
Findings from the present study contrast with the results of the previous MAIC [6]. The MAIC results from Ghia et al. showed that PFS and overall response rate outcomes for ibrutinib were consistent between RESONATE and ELEVATE-RR but that ibrutinib "underperformed" in ALPINE. The findings here show no statistically significant difference between the ibrutinib arms. The disparate findings between the present and previous study may be attributed to differences in the EMs and PFs adjusted for in the MAIC analyses. Several important patient characteristics such as sex, IGHV mutation status, TP53 mutation status, complex karyotype, and Binet stage were not considered in the Ghia study. Presence of complex karyotype, advanced Binet stage, unmutated IGHV, del(11q), and TP53 abnormalities are high-risk markers for CLL [10]. Failure to appropriately identify and select EMs and PFs in MAICs may result in biased or uncertain effect estimates, impacting the validity of the analysis [11].
Indirect treatment comparisons such as MAICs provide useful information on the comparative efficacy of treatments evaluated in separate trials, potentially filling evidence gaps for health technology assessments [5,12]. However, due to limitations (modeling assumptions and cross-trial differences in baseline characteristics) and confounding associated with these methodologies, MAIC analyses cannot replace the gold standard of RCTs, should be interpreted with caution, and should be viewed as observational and hypothesis-generating [5,13].
Like any other MAIC, this study had some limitations. Notably, the ESS of the ibrutinib arm in ALPINE was reduced to 63 after filtering out the non-high-risk patients and conducting the matching and adjustment. The study was by nature limited to the high-risk ALPINE population, which reduced the starting sample size. The ESS was further decreased as all important baseline characteristics were considered for accurate comparisons. Despite the small ESS, results were consistent across the multiple sensitivity analyses tested.
The present study did not evaluate the efficacy of the ibrutinib arm of RESONATE. Given that both ELEVATE-RR and ALPINE are more contemporary trials that compare a next-generation BTKi to ibrutinib, ELEVATE-RR was deemed more suitable for this comparison. We would expect ibrutinib to perform slightly better in RESONATE compared to ALPINE, potentially due to (1) the difference between RESONATE and ALPINE with regard to the geographic distribution of patients and (2) the fact that ibrutinib was the only BTKi available in clinical trials at the time of RESONATE, with the only alternatives being standard-of-care chemotherapies, possibly leading to enhanced adherence.
In conclusion, this MAIC used a comprehensive list of matching variables to compare the efficacy of the ibrutinib arms in ALPINE and ELEVATE-RR, showing no significant difference in the performance of ibrutinib across the two trials. Results were robust in all sensitivity analyses. While MAICs provide a basis for hypothesis generation with regard to treatment efficacy across trials, they are not a substitute for head-to-head RCTs, as they cannot balance all observable and unobservable differences at baseline. Consequently, ultimate evidence of relative efficacy must be sought within RCTs.
Table 1 Baseline characteristics before matching and after adjustment for the ibrutinib arms of ALPINE and ELEVATE-RR (A) and PFS-IRC, PFS-INV, and OS (B) in the base case and sensitivity analyses.
A. Baseline characteristics of ibrutinib arms in ALPINE and ELEVATE-RR before matching, and adjustment for EMs and PFs in base case and sensitivity analyses.
B. … case and sensitivity analyses adjusting for COVID-19 impact.
Column headers: Population; Model; Adjustment for COVID?; Ibrutinib ESS; PFS-INV HR a

Fig. 1 Survival outcomes. A PFS-IRC. B PFS-INV. C OS. a Given the availability of both IRC- and INV-assessed data. CI confidence interval, CLL chronic lymphocytic leukemia, COVID-19 coronavirus disease-19, EM effect modifier, HR hazard ratio, INV investigator, IPD individual patient-level data, IRC independent review committee, MAIC matching-adjusted indirect comparison, ORR overall response rate, OS overall survival, PF prognostic factor, PFS progression-free survival, RCT randomized controlled trial, R/R relapsed/refractory, SLL small lymphocytic lymphoma.